PLSC30500, Fall 2024

Part 1. Probability Theory (part b)

Andy Eggers

Random variables

Random variables

Recall that a random generative process produces an outcome \(\omega \in \Omega\); events (e.g. \(A\), \(B\)) are sets of outcomes.

A random variable \(X\) is a function that maps each outcome \(\omega\) to a real number; each number labels an event (the set of outcomes mapped to it). “Random variables are real-valued functions of outcomes” (A&M p. 38)

e.g. \(X(\omega) = 1\) means the outcome \(\omega\) belongs to the event labeled 1, i.e. \(\{\omega : X(\omega) = 1\}\). (NB: events are now labeled by numbers.)

Usually we just write \(X = 1\) and \(\text{Pr}[X = 1]\). (cf \(P(A)\) for events)

Can also think of \(X\) as a random process that produces numbers as outcomes, but A&M’s way distinguishes the random process itself from the researcher’s representation of it in numbers.
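A minimal Python sketch of this idea (my own illustration, not from A&M): the sample space for two fair coin flips, with \(X\) defined as a function of each outcome.

```python
# Sketch: a random variable is a function from outcomes to real numbers.
omega_space = ["HH", "HT", "TH", "TT"]  # sample space: two fair coin flips

def X(omega):
    """Random variable: number of heads in the outcome omega."""
    return omega.count("H")

# The event {X = 1} is the set of outcomes that X maps to 1.
event_X_equals_1 = [w for w in omega_space if X(w) == 1]
```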

Random variables: Blitzstein and Hwang figure

Random variables: another Blitzstein and Hwang figure

“Two random variables defined on the same sample space”

Why though?

Many outcomes/events of interest can be quantified:

  • \(X = 1\) for Yes/Happened, \(X = 0\) for No/Did not happen
  • \(X\) is e.g. number of casualties

This is much easier to work with than

  • event \(A\): it happened; event \(A^C\) (or \(B\)): it did not happen
  • event \(A\): 0 casualties; event \(B\): 1 casualty; etc.

Blitzstein and Hwang: “Random variables provide numerical summaries of the experiment in question.”

Functions/operators of random variables

Since a random variable \(X\) produces numbers, we can apply functions: e.g. \(X^2\), \(\sqrt{X}\), generically \(g(X)\)

This gives us a new number for each \(\omega\) – a new random variable.
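To illustrate (my own sketch, using the two-flip example from later in these slides): applying \(g(X) = X^2\) to \(X\) yields a new random variable with its own PMF.

```python
# Sketch: a function g of a random variable X is a new random variable.
# Here X = number of heads in two fair flips, and g(X) = X ** 2.
pmf_X = {0: 0.25, 1: 0.5, 2: 0.25}

def g(x):
    return x ** 2

# PMF of g(X): collect the probability mass of each value g(x).
pmf_gX = {}
for x, p in pmf_X.items():
    pmf_gX[g(x)] = pmf_gX.get(g(x), 0.0) + p
```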


We also want to describe a random variable using operators: e.g. \(E[X]\) (expectation), \(V[X]\) (variance).

This gives us a number to describe \(X\), not a new random variable. (We may estimate these from samples, producing RVs, e.g. sample mean, sample variance.)
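A quick sketch of the distinction (my illustration, using the two-flip PMF from the next slides): the operators \(E[\cdot]\) and \(V[\cdot]\) each return a single number describing \(X\).

```python
# Sketch: E[.] and V[.] summarize a random variable as a single number.
pmf = {0: 0.25, 1: 0.5, 2: 0.25}  # X = number of heads in two fair flips

E_X = sum(x * p for x, p in pmf.items())                # expectation
V_X = sum((x - E_X) ** 2 * p for x, p in pmf.items())   # variance
```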

Blitzstein and Hwang figure

Discrete random variables and the PMF

A random variable \(X\) is discrete if its range \(X(\Omega)\) is a countable set, e.g. \(\{1,2,3\}\) or \(\{1,2,3,\ldots\}\).

A discrete RV has a probability mass function (PMF): \[f(x) = \text{Pr}[X = x], \forall x \in \mathbb{R}.\]

For example, the number of heads in two flips of a fair coin:

\[ f(x) = \begin{cases} 1/4 & x = 0 \\ 1/2 & x = 1 \\ 1/4 & x = 2 \\ 0 & \text{otherwise} \end{cases} \]
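This PMF can be derived by brute force (a sketch, my own code): enumerate the four equally likely outcomes and tally the probability of each value of \(X\).

```python
from itertools import product
from fractions import Fraction

# Sketch: derive the PMF of X = number of heads in two fair flips
# by enumerating the equally likely outcomes.
outcomes = list(product("HT", repeat=2))  # HH, HT, TH, TT
f = {}
for w in outcomes:
    x = sum(1 for flip in w if flip == "H")
    f[x] = f.get(x, Fraction(0)) + Fraction(1, len(outcomes))
```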

The CDF

The cumulative distribution function (CDF) of a random variable \(X\) is

\[ F(x) = \text{Pr}[X \leq x], \forall x \in \mathbb{R}\] The CDF is another way to fully describe a random variable.

For the coin flip example,

\[ F(x) = \begin{cases} 0 & x < 0 \\ 1/4 & 0 \le x < 1 \\ 3/4 & 1 \le x < 2 \\ 1 & x \ge 2 \end{cases} \]
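As a sketch (my own code), the CDF of a discrete RV is just a running sum of the PMF, producing the step function above.

```python
# Sketch: the CDF of a discrete RV is a step function, F(x) = Pr[X <= x].
pmf = {0: 0.25, 1: 0.5, 2: 0.25}  # X = number of heads in two fair flips

def F(x):
    """Cumulative distribution function: sum the PMF over values <= x."""
    return sum(p for value, p in pmf.items() if value <= x)
```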

The PMF and CDF for number of heads in two coinflips

Continuous random variables

If a random variable can take on a continuum of values (i.e. \(X(\Omega)\) includes some interval of the real line), we say it is continuous. Examples?

Integrating PDF to get probability of event in interval

Probability density function (PDF) written \(f(x)\), CDF written \(F(x)\).

CDF at \(x\) is integral of PDF below \(x\):

\[F(x) = \text{Pr}[X \leq x] = \int_{-\infty}^x f(u) du\]

\[\text{Pr}[a \leq X \leq b] = \int_{a}^b f(u) du = F(b) - F(a)\]
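As a numerical sketch (my own illustration, assuming \(X \sim \text{Uniform}(0,1)\) so the PDF is 1 on \([0,1]\)): approximate the integral with a midpoint Riemann sum.

```python
# Sketch: Pr[a <= X <= b] as the integral of the PDF over [a, b],
# approximated by a midpoint Riemann sum. X ~ Uniform(0, 1) (assumed).
def pdf(u):
    return 1.0 if 0.0 <= u <= 1.0 else 0.0

def prob_interval(a, b, n=10_000):
    """Approximate the integral of pdf over [a, b] with n midpoint slices."""
    width = (b - a) / n
    return sum(pdf(a + (i + 0.5) * width) for i in range(n)) * width
```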

Integration visually

Uniform random variable

Bivariate relationships

A single random process can produce two random variables \(X(\omega)\) and \(Y(\omega)\).

  • rolling a die: \(X\) is the number on the die, \(Y\) is 1 if the roll is 3 or higher
  • sampling a respondent: \(X\) is age of respondent, \(Y\) is 1 if respondent voted

Describing bivariate relationships

We can describe two random variables \(X\) and \(Y\) with

  • joint PMF:

\[ f(x,y) = \textrm{P}[X=x, Y=y], \forall x, y \in \mathbb{R} \]

  • joint CDF:

\[ F(x,y) = \textrm{P}[X \leq x, Y \leq y], \forall x, y \in \mathbb{R} \]

Writing the PMF: table format

Let \(X\) denote number of heads in two tosses of a fair coin.

Let \(Y\) denote number of heads in one toss of a fair coin.

x   y   Pr[X = x, Y = y]
0   0   1/8
0   1   1/8
1   0   1/4
1   1   1/4
2   0   1/8
2   1   1/8

Writing a PMF: “cases” format

Let \(X\) denote number of heads in two tosses of a fair coin.

Let \(Y\) denote number of heads in one toss of a fair coin.

\[ f(x, y) = \begin{cases} 1/8 & x = 0, y = 0 \\ 1/8 & x = 0, y = 1 \\ 1/4 & x = 1, y = 0 \\ 1/4 & x = 1, y = 1 \\ 1/8 & x = 2, y = 0 \\ 1/8 & x = 2, y = 1 \\ 0 & \text{otherwise} \end{cases} \]
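This joint PMF can be reproduced by enumeration (my own sketch, assuming \(Y\) comes from a third flip independent of the first two, consistent with the table since the joint probabilities factor).

```python
from itertools import product
from fractions import Fraction

# Sketch: joint PMF from three independent fair flips (assumed setup):
# X counts heads in the first two flips, Y counts heads in the third.
joint = {}
for w in product("HT", repeat=3):
    x = sum(1 for flip in w[:2] if flip == "H")
    y = 1 if w[2] == "H" else 0
    joint[(x, y)] = joint.get((x, y), Fraction(0)) + Fraction(1, 8)
```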

Writing a joint PMF: “\(X\)-by-\(Y\)” format

Let \(X\) denote number of heads in two tosses of a fair coin.

Let \(Y\) denote number of heads in one toss of a fair coin.

Then \(\text{Pr}[X = x, Y = y]\) is given by this table:

        y = 0   y = 1
x = 0   1/8     1/8
x = 1   1/4     1/4
x = 2   1/8     1/8

Graphical representation of joint PMF

Marginal PMF

Recall: A joint PMF \(f(x, y) = \text{Pr}[X = x, Y = y]\) describes the distribution of two discrete RVs \(X\) and \(Y\).

We can also talk about the marginal PMF of one of the variables:

\[f_Y(y) = \text{Pr}[Y = y] = \sum_{x \in \text{Supp}[X]} f(x, y), \forall y \in \mathbb{R}.\]

Basically, this describes the distribution of \(Y\) ignoring \(X\).

This is an application of the Law of Total Probability.

Marginal PMF (2)

With the \(X\)-by-\(Y\) representation of joint PMF, the marginal PMF of \(X\) is the row sums, marginal PMF of \(Y\) is column sums (written in the margins):

        y = 0   y = 1   f_X(x)
x = 0   1/8     1/8     1/4
x = 1   1/4     1/4     1/2
x = 2   1/8     1/8     1/4
f_Y(y)  1/2     1/2
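In code (my own sketch): summing the joint PMF over \(y\) gives the row sums (marginal of \(X\)), and summing over \(x\) gives the column sums (marginal of \(Y\)).

```python
from fractions import Fraction

# Sketch: marginal PMFs as row sums (for X) and column sums (for Y)
# of the coin-flip joint PMF from the slides.
joint = {(0, 0): Fraction(1, 8), (0, 1): Fraction(1, 8),
         (1, 0): Fraction(1, 4), (1, 1): Fraction(1, 4),
         (2, 0): Fraction(1, 8), (2, 1): Fraction(1, 8)}

f_X, f_Y = {}, {}
for (x, y), p in joint.items():
    f_X[x] = f_X.get(x, Fraction(0)) + p  # sum over y: row sums
    f_Y[y] = f_Y.get(y, Fraction(0)) + p  # sum over x: column sums
```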

Marginal PMF (3)

With the graphical representation, think about sweeping the mass to the axis:

Conditional PMF

For RVs \(X\) and \(Y\), we can also talk about the conditional PMF of \(Y\) at a value of \(X\):

\[f_{Y|X}(y|x) = \text{Pr}[Y = y \mid X = x] = \frac{\text{Pr}[X = x, Y = y]}{\text{Pr}[X = x]} = \frac{f(x, y)}{f_X(x)}\] \(\forall y \in \mathbb{R}\) and \(\forall x \in \text{Supp}[X]\).

  • Just like conditional probability for events
  • Intuitively, take the joint probabilities where \(X = x\) and scale them up by a factor of \(1/\text{Pr}[X = x]\), so they sum to 1
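The definition translates directly into code (my own sketch, using the coin-flip joint PMF from the earlier slides):

```python
from fractions import Fraction

# Sketch: conditional PMF f_{Y|X}(y|x) = f(x, y) / f_X(x).
joint = {(0, 0): Fraction(1, 8), (0, 1): Fraction(1, 8),
         (1, 0): Fraction(1, 4), (1, 1): Fraction(1, 4),
         (2, 0): Fraction(1, 8), (2, 1): Fraction(1, 8)}

def marginal_X(x):
    """f_X(x): sum the joint PMF over all y."""
    return sum(p for (xv, _), p in joint.items() if xv == x)

def cond_Y_given_X(y, x):
    """f_{Y|X}(y|x): joint probability divided by the marginal of X."""
    return joint[(x, y)] / marginal_X(x)
```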

Conditional PMF (2)

With the \(X\)-by-\(Y\) representation of joint PMF, you get the conditional PMF of \(Y\) given \(X = x\) by dividing each row by the row sum (i.e. the marginal probability of \(X = x\)):

\(f(x, y)\)

        y = 0   y = 1
x = 0   1/8     1/8
x = 1   1/4     1/4
x = 2   1/8     1/8

\(f_{Y|X}(y |x )\)

        y = 0   y = 1
x = 0   1/2     1/2
x = 1   1/2     1/2
x = 2   1/2     1/2

Conditional PMF (3)

With the graphical representation, think about taking a slice of the PMF and rescaling it:

Jointly continuous random variables

Joint probability density function (PDF):

Jointly continuous random variables (2)

Another joint PDF:

Marginal PDF in the continuous case

\[f_Y(y) = \int_{-\infty}^{\infty}f(x,y) dx, \forall y \in \mathbb{R}\]

To get \(f_Y(y)\) for a specific \(y\), slice the joint pdf at \(Y = y\), and integrate (sum) \(f(x, y)\) across all values of \(x\).

To get \(f_Y(y)\) for all \(y\)s, think about squishing the PDF into the \(y\)-axis.
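A numerical sketch of this slicing-and-integrating (my own illustration; the joint PDF \(f(x,y) = x + y\) on the unit square is my choice, not one of the figures from the slides):

```python
# Sketch: marginal PDF f_Y(y) by numerically integrating the joint PDF
# over x. Illustrative joint PDF (assumed): f(x, y) = x + y on [0,1]^2,
# which integrates to 1.
def joint_pdf(x, y):
    return x + y if 0 <= x <= 1 and 0 <= y <= 1 else 0.0

def marginal_Y(y, n=10_000):
    """Midpoint-rule approximation of the integral of f(x, y) over x."""
    width = 1.0 / n
    return sum(joint_pdf((i + 0.5) * width, y) for i in range(n)) * width
```

Analytically, \(f_Y(y) = \int_0^1 (x + y)\,dx = 1/2 + y\) for this example, which the approximation matches.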

Conditional PDF in the continuous case

\[f_{Y \mid X}(y \mid x) = \frac{f(x,y)}{f_X(x)}, \forall y \in \mathbb{R} \, \text{and} \, \forall x \in \text{Supp}[X]\]

For a specific \(x\), what does \(f(x,y)\) look like? What does \(f_X(x)\) look like?

Conditional PDF (2)

\(f(x, y)\) for \(x=1\) is the intersection between the joint pdf and the plane \(x = 1\):

And \(f_X(1)\) is the integral of that intersection: numerically I compute it to be about 0.24.

Conditional PDF (3)

Conditional PDF (4)

Above is \(f_{Y \mid X} (y \mid x)\) for one value of \(x\). Here it is for all \(x\):

Conditional PDF (5)

Questions that are best answered with

Joint dist. \(f(x,y)\):

  • how common are tall, smart people?
  • in a particular industry, are there more women CEOs or men CEOs?
  • what proportion of crimes are committed by young women?

Conditional dist. \(f_{Y|X}(y | x)\):

  • how smart are tall people? how tall are smart people?
  • in a particular industry, is the proportion of women who are CEOs higher than the proportion of men who are CEOs?
  • of crimes committed by women, what proportion are committed by young women?

Can be very important to distinguish between the two!

Which conditional probability?

Researchers often get confused about which conditional probability they are handling, \(\Pr[Y \mid X]\) or \(\Pr[X \mid Y]\).

  • \(\Pr[ \text{data} \mid H_0 \textrm{ is true}]\) is (roughly) the p-value; \(\Pr[ H_0 \textrm{ is true} \mid \text{data}]\) is something else
  • Johnson et al (2019) use data on shootings to measure \[\Pr(\text{minority civilian} \mid \text{shot, white officer, } X) - \\ \Pr(\text{minority civilian} \mid \text{shot, minority officer, } X),\] but (as pointed out by Knox and Mummolo) interpret results as if it were \[\Pr(\text{shot} \mid \text{minority civilian, white officer, } X) - \\ \Pr(\text{shot} \mid \text{minority civilian, minority officer, } X)\]

\(\Pr[Y \mid X]\) and \(\Pr[X \mid Y]\) are connected by Bayes' Rule.
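A toy numeric sketch of that connection (the probabilities below are purely hypothetical, my own choice):

```python
# Sketch: Bayes' rule connects Pr[Y | X] and Pr[X | Y].
# Hypothetical inputs (assumed for illustration):
p_X = 0.3             # Pr[X]
p_Y_given_X = 0.8     # Pr[Y | X]
p_Y_given_notX = 0.2  # Pr[Y | not X]

# Law of total probability gives Pr[Y]; Bayes' rule gives Pr[X | Y].
p_Y = p_Y_given_X * p_X + p_Y_given_notX * (1 - p_X)
p_X_given_Y = p_Y_given_X * p_X / p_Y
```

Note \(\Pr[X \mid Y] \approx 0.63\) here, very different from \(\Pr[Y \mid X] = 0.8\): the two conditionals need not be close.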

Independence of random variables

Random variables \(X\) and \(Y\) are independent if, \(\forall x, y \in \mathbb{R}\):

\[f(x, y) = f_X(x) f_Y(y)\]

Equivalently,

\(X\) and \(Y\) are independent if the conditional distribution of \(X\) at every \(y\) is the same as the marginal distribution of \(X\), i.e. if \(f_{X | Y}(x | y) = f_X(x)\) \(\forall x \in \mathbb{R}\) and \(\forall y \in \text{Supp}[Y]\)

And vice versa.
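We can check this definition directly for the coin-flip example from the earlier slides (my own sketch): the joint PMF factors into the product of the marginals, so \(X\) and \(Y\) are independent.

```python
from fractions import Fraction
from itertools import product

# Sketch: check independence by comparing the joint PMF to the product
# of the marginals, for the coin-flip joint PMF from the slides.
joint = {(0, 0): Fraction(1, 8), (0, 1): Fraction(1, 8),
         (1, 0): Fraction(1, 4), (1, 1): Fraction(1, 4),
         (2, 0): Fraction(1, 8), (2, 1): Fraction(1, 8)}

f_X = {x: sum(p for (xv, _), p in joint.items() if xv == x) for x in (0, 1, 2)}
f_Y = {y: sum(p for (_, yv), p in joint.items() if yv == y) for y in (0, 1)}

independent = all(joint[(x, y)] == f_X[x] * f_Y[y]
                  for x, y in product((0, 1, 2), (0, 1)))
```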